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ABSTRACT 


s\ 


V 

— Consider' the  estimation  and  smoothing  problem  for  a  hierarchical 
Markov  process.  The  supremal  state  evolves  autonomously;  infernal 
dynamics  and  observations  may  be  statistically  dependent  on  the  supremal 
state.  This  class  of  processes  has  more  structure  than  a  general  Markov 
process;  the  implications  of  this  structure  are  developed  here.  Of 
special  interest  is  the  case  of  hybrid  systems,  where  the  supremal  state 
is  discrete  and  the  infernal  dynamics  are  linear  and  Gaussian.  This 
structure  commonly  appears  in  diverse  applications,  including  failure 
detection,  maneuvering  target  tracking,  and  digital  communications 
on  analog  channels.  It  is  also  the  structure  for  which  the  most  useful 
conclusions  can  be  drawn. 


OPTIMAL  SMOOTHING  AND  ESTIMATION  FOR  HYBRID  STATE  PROCESSES. 

I.  INTRODUCTION 

The  impending  availability  of  very  large  scale  integrated  circuits  gives 
us  the  opportunity  to  review  some  of  the  classical  approaches  to  control  and 
estimation  problems,  particularly  those  with  a  combinatorial  structure,  and 
reconsider  some  of  the  design  tradeoffs  inherent  therein.  This  new  technology 
provides  the  option  of  efficiently  implementing  algorithms  which  are  rather 
profligate  in  their  requirements  for  multiplies  and  adds,  provided  that  the 
algorithm  cam  be  decomposed  into  a  number  of  highly  structured  loci  of 
computation  with  relatively  loose  coupling  (in  terms  of  data  transfer) 
between  them.  Dynamic  estimation  problems  provide  a  source  of  such  algorithms, 
particularly  when  they  have  a  more  complex  structure  than  the  oft  discussed 
linear-Gaussian  case. 

The  special  structure  considered  here  involves  a  Markov  process  with 

state  space  X  which  can  be  decomposed  into  subspaces  x  X2 ,  and  where 

the  dynamics  on  X  are  independent  of  X«,  but  not  vice-versa.  The  observation 
""l  ^ 

space  Y  can  be  decomposed  compatibly.  This  structure  lies  at  the  heart  of 
several  important  applications,  particularly  in  hybrid  systems  where  X^ 
is  discrete  (modeling  failure  modes,  maneuver  modes,  or  digital  symbols) 
and  X.2  continuous  (modeling  system  dynamics,  target  trajectories,  or  channel 
dynamics,  respectively) .  These  Drohlems  ?re  usuallv  dominated  hv  t*e  entire 
discrete  state  sequence.  Many  ad  hoc  solutions  to  these  types  of  problems 
have  appeared  in  the  literature  (1-3,11),  where  approximations  are  required  in 
order  to  overcome  the  exponential  growth  of  the  set  of  discrete  state 
trajectories  as  the  time  horizon  of  the  problem  advances.  Now  that  combina¬ 
tional  problems  are  not  necessarily  computationally  unassailable,  it  is 


worthwhile  understanding  the  extent  to  which  these  problems  can  be  solved 
exactly.  While  the  result  may  still  be  unimplementable,  inclusion  of  conqautation- 
reducing  features  which  do  not  affect  performance  (and  which  do  exist) certainly 
provides  a  starting  point  for  other  modifications. 

This  paper  develops  optimal  methods  for  approaching  the  filtering  and 
smoothing  problems  for  systems  with  the  above  structure.  The  contributions 
are  of  two  types:  specific  techniques  for  reducing  the  complexity,  of  hybrid 
system  estimation  algorithms,  and  a  general  structure  for  approaching  this 
class  of  problems.  The  techniques  and  approach  seem  quite  helpful  in  designing 
algorithms  for  VLSI  implementation,  but  do  not  entirely  solve  the  problem. 

As  an  example  will  show,  the  specific  techniques  developed  here  may  reduce 
the  combinatorial  growth  of  a  problem  from  exponential  to  linear  (in  time) ; 
this  is  helpful,  but  still  not  practical,  and  approximations  must  also  be 
introduced.  Thus  a  prime  purpose  of  this  work  is  to  delimit  the  power  of 
exact  techniques,  and  create  a  framework  for  future  performance  analysis  of 
approximate  techniques. 

The  development  begins  with  a  formal  problem  statement,  followed  by 
the  derivations  of  optimal  filtering  and  smoothing  techniques  in  a  general 
setting.  These  are  then  specialized  to  the  linear-Gaussian  and  hybrid 
linear-Gaussian  cases.  In  the  latter  the  greatest  payoff  is  obtained;  an 
illustration  of  this  concludes  the  work. 


II.  PROBLEM  STATEMENT 


A.  Models 

Let  the  state  space  of  a  Markov  process  be  X  =  ^  x  X^.  is  the  state 

space  of  the  supremal  subsystem  which  evolves  autonomously;  X that  of  the 

infernal  subsystem  which  is  dependent  upon  the  value  of  the  supremal  state 

4 

x^(t).  Formally,  we  make: 

Assumption  1:  The  state  transition  probabilities  factor  as 

pU^t+l)  ,x2(t+l)  jx1(t),x2(t))=  pU^t+l)  |x1(t))p(x2(t+l)  |Xl(t)  ,x2(t)) 

□  (2-1) 

The  process  is  observed  via  the  space  Y  =  V  x  Y  ,  where  Y  contains 

X  ^  X 

observations  of  the  supremal  state  only,  and  Y2  of  the  joint  state.  Again, 
make 

Assumption  2:  The  observation  probabilities  factor  as 

P(yx(t) ,  y2(t)  Ix^t)  ,x2<t))=  p(y1(t)  !x1(t))p(y2(t)  jx^t)  ,x2(t)) 

O 

Implicit  in  the  above  are  the  usual  conditional  independence  assumptions 
for  a  Markov  process  ,  so  the  quantities  in  (2.1)  and  (2.2)  completely 
specify  the  system. 

We  will  be  interested  in  the  maximum  a  posteriori  (MAP)  estimates 
of  the  state  (filtering),  or  entire  state  trajectory  (smoothing),  conditioned 
on  an  sequence  of  observations  received  from  the  system.  Introducing  the 
notation 

4.  For  the  general  case  derivations  will  be  done  formally.  This  is  exact 
when  X  is  discrete,  and  when  all  invoked  distributions  exist  and  are 
well  defined. 


(2-3) 


?  •/** 


x.  (t)  e  x. 

i  —i 


v  (t)  e  y. 

i  -1 


HJ 


for  sequences  of  states  and  observations  (over  a  time  interval  Te{l, . . . ,t}) , 
the  problem  is 


re 


'V  a  » 

-V'j 


Assumption  3:  Find 

* 

a)  for  the  filtering  problem, the  state  x  (t)  which  maximizes 
P (x  (t)  |  Y  (t)) 

* 

b)  for  the  smoothing  problem, the  state  trajectory  X  (t >  maximizing 

p(X(t)  | Y(t) ) .  D 

This  is  the  general  problem*  Two  special  cases  which  are  of  interest  are  the 

linear-Gaussian,  and  the  discrete/linear-Gaussian  (hybrid)  structures.  In 

n . 

the  former,  assume  that  X^  =  TR  1 ,  and  that  the  system  dynamics  are 


linear  with  additive  white  Gaussian  driving  noise. 


Assumption  1L:  The  hierarchical  dynamics  are: 


(2.1)  becomes 


xL(t+l)  -  A_u  x^t)  +  wi<t) 


w1(t)~N(C,21)  (2-4) 

x2(t+l)  -  A  ^(t)  +A22X2(t)  +»2(t)  w2(t)~N(0,22)  (2-5) 

where  ^  and  are  positive  definite,  and  w^  and  w2  are  jointly 
independent  and  white. 

Q 

"*1 

Similarly,  the  observations  lie  in  =  E  ,  and 
Assumption  2L:  The  observation  equations  are 


y^t)  =  c11  x1<t)  +  vL(t) 


vx(t)  ~N(0,R1) 


(2-6) 


5.  Extension  to  time-varying  system  and  noise  matrices  is  str a i phtf orvard 
and  not  considered  here  for  notational  clarity.  A  number  of  other 
assumptions  may  be  relaxed;  the  purpose  here  is  to  develop  some  new 
structure  in  the  simplest  setting  possible. 
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?2(t)  =£2l*l(t)  +  -22  *2(t)  +  ^2(t)  v  (t)~N($,R  )  (2-7) 

where  R^  and  R^  are  positive  definite  and  and  are 
jointly  independent.  ° 

Thus  the  conditonal  distributions  for  the  linear  case  p(x^ (t+1) | x^ (t) ) ,  etc. 

in  (2-1)  and  (2-2)  are  all  multivariable  Gaussian  densities  with  means 

and  covariances  specified  by  (2-4)  -  (2-7). 

For  hybrid  models,  a  combination  of  discrete  and  continuous  dynamics 

exist.  The  supremal  system  is  discrete,  specifying  some  structural  mode, 

and  the  infernal  is  assumed  linear  -  Gaussian,  with  descriptive  matrices 

1  n  1 

dependent  upon  the  value  of  the  supremal  state.  Thus  =  {x1#...,x  }  , 

n2 

X2  ■  R  ,  and 

Assumption  1H:  The  dynamics  are  specified  by 


p(xx(t+l) |x1 ct) ) 

(2.8) 

x2(t+l)  =*A(xi(t))  x2(t)  +  w(t) 

w(t)~  N(0,Q(x1(t))) 

with  £  positive  definite.6 

□  (2.9) 

m  n»2 

Finally,  ,  *2  =  3R  ,  and 

Assumption  2H:  The  observations  for  a  hybrid  system  are  specified  by 

pfy^t)  Ix^t) ) 

(2.10) 

y2(t)  -  C(x^ (t) )  x2(t)  +  v (t) 

v(t)~N(0,R(Xl(t))) 

(2.11) 

with  R  positive  definite.6 

Q 

6.  Nonzero,  x^-  dependent  means  may  be  treated  by 
the  dependence  of  A  or  C  on  x^ (t) 

state  augmentation  and 
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These  are  the  three  classes  of  models  treated  in  sections  III-V, 
respectively. 

B.  Applications 

It  is  not  appropriate  to  completely  review  all  applications  involving 
filtering  and  smoothing  of  hierarchical  or  hybrid  Markov  processes  here. 
However,  some  sense  of  the  applications  to  which  these  techniques  may  apply 
serves  as  useful  motivation. 

Certain  failure  detection  and  identification  problems  [1]  are  naturally 
described  by  (2.8)  -  (2.11).  X2  represents  the  usual  states  of  a  dynamical 
system  operating  under  feedback  control,  describes  various  failures 
which  may  occur  in  actuators  and  internal  parts  of  the  system  (causing  changes 
in  A  or  £)  or  in  sensors  (appearing  in  c  or  R) .  As  long  as  the  causes  of 
failure  are  unrelated  to  the  dynamic  states  (e.g.  due  to  stressing  operation), 
the  above  model  applies.  Interest  typically  centers  on  determining  the  time 
and  type  of  failures;  estimation  of  x2  itself  is  often  a  secondary  goal. 

Maneuvering  and  multiobject  tracking  in  clutter  [2]  are  other  hybrid 
estimation  problems.  X2  represents  positions  and  velocities  of  objects  in 
a  region;  X^  may  indicate  maneuver  modes  [4J ,  target  identities, detection/ 
nondetections  due  to  environmental  effects  [5],  or  the  permutations  of 
sensor  returns  which  are  not  labeled  with  the  target  from  which  they 
originated  [61.  In  the  latter  case,  signature  information  derived  from 
the  sensed  waveform  would  be  modeled  in  Y^ ;  position  and  velocity  data 
by  Y2. 

Finally,  certain  communication  problems  exhibit  a  hybrid  structure. 

X.  may  represent  a  digital  source  (e.g.  of  a  pseudorandom  code),  and 
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2^  analog  channel  dynamics.  might  also  model  the  existance  of  bursty 
interference  which  effectively  set?  C  to  zero,  and  R  large,  intermittently. 

III.  THE  GENERAL  CASE 

This  section  develops  the  concepts,  notation,  and  basic  techniques 
for  optimal  filtering  and  smoothing  under  assumptions  1-3. 

A.  Filtering 

One  might  expect  that  the  hierarchical  structure  of  the  system  would 
lead  to  the  posterior  distribution  having  some  structure  such  as 

pfx^t)  ,x2(t)  |vi(t)  ,Y2(t) )  =  p(x1  (t)  jyi  (t) )  p(x2  (t)  ^  (t)  ,Y2  (y) ) 

3-1) 

If  this  were  so,  then  one  could  design  a  filter  for  the  supremal  system 
alone,  and  then  one  for  the  infernal  system  which  used  the  results  of  the 
supremal  filter  in  the  estimation  of  x2> 

Unfortunately,  this  is  not  the  case.  One  step  of  the  Bayesian 
estimator  is 

p(x1(t+l),x2(t+l)  |Y1(t+l),Y2(t+l))  =  p(y1(t+l)|x1(t+l))  p(y2(t+l)|x1(t+l),x2(t+l)). 

(3-2) 

l  p(x1(t+l)  |x1  (t) )  l  £>(x  (t+l)|  *i(t)»  x2(t))p(x  (t),x  (t)|Y  (t)Y  <t)) 
x^t)  x2(t)  "  ~  ^  X 

p(Y1<t+l) ,y2 (t+1) |Y1(t) ,Y2(t)) 

Note  that  even  if  the  conditional  distribution  at  time  t  had  a  separation 
property  such  as  (3-1),  it  would  be  lost  both  in  the  propagation  of  the 
dynamics  of  x2  and  in  the  update  with  y2>  The  intuition  behind  this  becomes 
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clear  in  an  extreme  example:  set  =  X2  and  the  infernal  dynamics  so  that 
x2(t)  =  x^(t).  Then  not  only  does  an  observation  y^  provide  direct 
information  on  x^,  but  also  about  x2» 

Therefore  the  filtering  solution  exhibits  no  special  structure  in  this 

case. 


B.  Smoothing:  Compact 

We  will  consider  two  approaches  to  the  smoothing  problem,  one  the 
usual  optimal  algorithm,  and  the  other  an  expanded  version  which  better 
permits  exploitation  of  the  hierarchical  structure  at  a  cost  of  increased 
computation.  This  section  treats  the  former;  section  C  the  latter. 

The  suitability  of  the  hierarchical  structure  to  the  smoothing  problem 
is  suggested  by  the  fact  that 


r„  p<xi'*2ivv  ■  <p<*1ix1>p«x>  • 


VX2 


max  {p(Y2|xi,X2)  p(X2|xi)}} 
X2 


This  is  a  direct  result  of  assumptions  1  and  2,  since  they  imply  : 
p(x1,x2)  =  p(x2jx1)  p(X1) 


p(y1,y2|x1,x2) 


=  pIyJx^  p(Y2|x1,x2) 


(3-3) 


(3-4) 

(3-5) 


Note  that  it  is  not  necessary  to  compute  p(Y^,Y  )  at  all:  (3-3)  suggests 
an  algorithm  by  which  the  best  X.,,  is  found  for  each  x^;  and  then  the 
best  X^  is  found.  Unfortunately,  x^(t)  and  X2(t)  are  elements  of  rather 


large  sets. 
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However,  this  has  not  yet  considered  the  Markov  structure  of  the 
problem,  which  is  essential  to  recursive  smoothing  techniques.  Consider 
the  smoothing  solution  on  X^  x  X^;  we  will  be  interested  in  determining 
if  (3-3)  affects  its  structure. 

Definition:  A  survivor  function  [ 71  s(x(t)lY(t))  is  defined  by 

s(x(t) | Y (t) )  =  max  p(Y(t) |x(t) ,X(t-l) )p(x(t) ,X(t-l) )  (3-6) 

X(t-l) 

□ 

Technically,  s  is  a  function  on  X  x  Yfc;  since  we  will  only  be 
interested  in  evaluating  it  along  a  particular  realization  of  the  output 
process  Y(t),  it  is  convenient  to  view  it  as  a  function  of  x(t).  It 
indicates  the  unnormalized  probability  of  the  most  likely  state  trajectory 
X (t)  which  terminates  in  x(t),  conditioned  on  the  observation  sequence  Y(t). 
Note  that  the  maximizing  X(t-l)  in  (3-6)  may  not  be  unique,  but  one  of  them 
may  be  selected  and  stored  for  each  x (t) .  This  permits  reconstruction  of 
the  entire  MAP  state  sequence  by  finding  x(t)  which  maximizes 
s(x(t) | Y (t ) ) ,  and  then  determining  the  X(t-l)  thus  associated  with  it. 

The  implications  of  Markov  structure  are  that  s  is  recursively 
computable . 

Tomma  1.  s(x(t)|Y(t))  may  be  computed  via 

s(x(t+l)  |  Y  (t ))  =  max  p(x(t+l)  |  x  (t))  s  (x (t)  |  Y  (t) )  (3-7) 

x(t) 

s (x (t+1) | Y (t+1) ) =  p (y ( t+1) | x (t+1) )  s(x(t+l) |Y(t))  (3-8) 

Proof :  Bayes'  theorem,  interchange  of  max  operations  with  functions 

not  of  the  same  variable,  and  the  Markov  assumptions: 

t 

p (Y (t)  | X (t) )  =  II  p(y  (s)  |x  (s) )  (3-9) 

s=l 
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p  (X  (t) ) 


t-1 

n 

S=1 


p(x(s+l) |x(s„]  p(x(l) ) 

D 


(3-10) 


If  s  is  replaced  with  -£n(s),  a  monotonic  operation,  and  the  resulting 
function  is  minimized,  the  Viterbi  algorithm  [7]  emerges. 

Computationally,  the  Viterbi  algorithm  is  relatively  simple,  requiring 
only  0 (N^N^)  operations  per  time  step  (for  discrete  X).  Memory  for  storing 
the  preceeding  trajectory  associated  with  each  x(t)  is  the  dominant  factor 
in  its  implementation.  As  in  the  filtering  problem,  however,  the 
hierarchical  structure  of  the  Markov  process  does  nothing,  in  general,  to 
simplify  the  algorithm  further.  Again,  the  case  where  x^t)  =  x2(t)  for 
all  t  generates  an  s(x|y)  which  is  diagonal  on  x  X2 ,  and  demonstrates 
the  lack  of  decomposition. 


C.  Smoothing :  Expanded 

An  algorithm  for  the  smoothing  problem  can  be  constructed  which  does 
exploit  the  hierarchical  structure,  but  at  a  great  increase  in  computational 
complexity.  As  such,  it  is  not  useful  for  general  problems  of  the  class 
considered  here,  but  it  will  be  the  key  to  the  structure  of  the  hybrid 
smoothing  probJ^m. 


Definition;  A  conditional  survivor  function  s(x2|x^,Y2)  is  defined  as 


s(x2(t) | Xx (t) ,Y2(t)) 


max  p(Y  (t) |x  (t) ,X  ( t) )  p(X ,(t)|x  (t) ) 
x2(t-i) 


This  function  is  an  intermediary  in  the  solution  of  the  smoothing 
problem,  as  the  second  maximization  in  (3-3)  can  be  rewritten  as 


max  {p(Y  (t)|x.(t),X  (t))p(X  (t)|x  (t))}  =  max  s  (x2  (t)  |  Xx  (t)  ,Y2  (t) ) 
X2(t)  x2<t) 


(3-12) 
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These  equations  suggest  an  algorithm  whereby  s j , Y2 )  is  computed,  based 
only  on  f°r  ®ach  X^.  The  result  may  be  summarized  in  the  function 

r(XL(t)  | Y2 (t) )  =  max  s  (x2  (t)  |  Xx  (t)  ,Y2  (t) )  (3-13) 

x2(t) 

The  outer  maximization  in  (3-3)  is  then  over  the  product  of  pfX^jY^)  p(x^),  which 

is  computable  iust  from  the  structure  of  the  sumnremal  system,  and 
r(XjjY2),  derived  from  the  infernal  structure  only. 

This  algorithm  does  capitalize  on  the  hierarchical  structure,  but 
leaves  two  questions  to  be  answered.  First,  can  the  s(x  |x  ,Y  )  be 
computed  recursively?  Second,  is  there  some  recursive  structure  which  can 
be  exploited  in  the  outer  maximization,  over  X^,  without  reducing  the 
solution  to  a  Viterbi  algorithm  on  x  X2?  The  answer  to  the  latter  is 
particularly  critical,  as  the  size  of  grows  exponentially  with  time. 

The  answer  to  both  questions  is  yes.  Consider  the  computation  of 
s(x2|xifY2)  first. 

Lemma  2;  s(x2|x^,Y2)  may  be  computed  as 

a)  predict: 

s(x2(t+l) |xi(t) , Y2  < t) )  =  max  p (x2 (t+1) j (t) ,x2 (t) ) 

x2(t) 

s(x2(t) |x1(t) ,Y2(t))  (3-14) 


b)  update : 

s(x2  (t+1)  |xx  (t+1)  ,Y2  (t+1) )  =  p(y2  (t+1)  ) (t+1)  ,x.,  (t+1) ) 

s(x2(t+l) |xi(t) ,Y2(t)) 
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Proof;  Identical  to  Lemma  1,  with  conditioning  on  X  . 

1  O 

The  structure  of  these  computations  is  straightforward.  For  each  supremal 
trajectory  X^(t),  (3-14)  and  (3-15)  implement  a  Vitebri  calculation  for  the 
survivor  function  on  x^.  Note  that  the  explicit  conditioning  on  removes 
the  coupling  between  the  statistics  of  x^  and  the  supremal  observations  Y^; 

Xx  provides  a  more  complete  statistical  specification  of  the  evolution  of  x2 
than  does  Y^; 

Note  that  s  (x2 |  X^ , Y^)  and  p(x2|x^,Y2)  convey  very  different  things. 

The  most  likely  single  trajectory  to  consistant  with  X^  and  Y2,  is 
captured  by  s;  p  gives  the  aggregate  probability  of  being  in  x2>  If  one 
state  in  x2  can  be  reached  by  a  number  of  individually  low  probability 
paths,  while  another  can  by  a  single  high  probability  path,  the  s  and  p 
will  generally  be  of  quite  different  character. 

t 

The  best  visualization  of  this  structure  is  to  view  x^  as  a  tree, 
rooted  at  the  time  zero  at  a  single  point,  and  with  each  node  representing 
a  state  sequence  over  a  period  of  time.  From  each  node  branches  lead  to 
all  states  which  may  be  reached  as  the  system  extends  that  state  sequence 
by  one  time  step.  Computation  of  s(x2|X^,Y2)  involves  running  a  Vitebri 
algorithm  along  each  branch  of  that  tree. 

Clearly  this  becomes  cumbersome  as  t  grows  large.  One  can  then 
consider  the  outer  optimization  in  (3-3)  as  a  means  for  pruning  the  tree 
of  X^  sequences.  However,  this  must  be  done  carefully;  it  is  not 
appropriate  to  merely  eliminate  sequences  in  X^  merely  because  they  have 
low  probability  at  time  t.  (For  example,  there  may  be  only  one  sequence  X^, 

it 

albeit  of  low  probability,  which  enables  x2(t)  to  enter  some  state  x2; 

* 

if  a  subsequent  value  of  y2  indicates  that  x2 (t)  =  x2  with  probability  1, 
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it  would  be  helpful  if  that  X^  were  still  under  consideration) . 

In  fact,  the  tree  of  X^  sequences  may  indeed  be  pruned,  in  a  way 
which  guarantees  that  the  MAP  trajectory  will  never  be  eliminated  yet 
preserves  the  hierarchical  structure.  This  first  requires: 


Lemma  3:  pOY^lt) | X^ (t) )  pfX^ft))  may  be  computed  recursively: 
tp (Y^ (t+1) | Xj (t+1) )  p(X1(t+l)J  = 

=  pty^t+l)  jx^t+Diptx^t+l)  Ix^t))  [p(Y1(t)  { X1  (t) )  p  (Xx  ( t) )  ] 


Proof :  Elementary  manipulations  and  the  Markov  properties. 

Lemma  3  provides  for  the  computation  of  the  term  other  than  r (X^ | Y2> 
the  term  which  captures  the  supremal  dynamics  through  p (x^ (t+1) | (t) ) , 
and  the  supremal  observation  through  (t+1)  |x^  (t+1) ) . 

One  more  notion  is  needed. 


Definition :  The  sources  of  a  state  x^(t)  are  all  trajectories  in  X^ 
terminating  in  x^ . 


*  * 

Theorem  1:  Let  (X^  (t) ,  X2  be  the  MAP  trajectory  for  observations 


Y^ (t) , Y^ (t) .  Let  T  be  any  time  preceeding  t.  If,  at  time  T 


P(Y1(T)  l^l(T))  P<VT>>  S(X2(T)|X1(T),Y2(T)) 

<  pfY^r)  |x1(t))  pU^t))  s(x2(t)  |x1(t),y2(T)) 


(3-17) 


max 

{X1(T)} 


for  each  x2(X),  where  {>:^(T)}  contains  all  sources  of  x^(T)  except 
X^(T)  itself,  then  X^d)  will  not  be  a  subsequence  of  X^(t). 


16) 


Proof :  Consider  the  (compact)  smoothing  algorithm,  and  the 


s(x1(t)  ,  x2(t)|yi(t),  Y2<x))  computed  by  it.  Each  (x^x),  x2(x)) 
has  a  sequence  (X^fx),  X„,  (x) )  associated  with  it  which  constitutes 
an  optimal  trajectory  estimate  through  (x^Cx),  x2<t)).  If  X^(x) 
never  appears  as  the  first  component  of  one  of  these  associated 
sequences,  it  will  not  appear  as  a  subsequence  of  any  longer  tra- 
jectory.  X^(x)  can  only. appear  in  association  with  states  of  the 

/S 

form  (x^x),  x2  (x ) ) .  (3-17)  assures  that  there  is  no  x2(x)  for 

■V  */  /V 

which  X^(x)  is  the  most  likely  source  of  x^(x),  hence  X^(x)  may 
be  eliminated.  a 


Theorem  1  establishes  a  looser  requirement  for  eliminating  trajectories 
than  the  compact  smoothing  algorithm.  The  Viterbi  algorithm  will  eliminate 
trajectories  at  each  point  (x^x),  x2(x)),  leaving  only  one  candidate  termi- 
nating  there.  (3-17)  suggests  eliminating  X^ (x)  only  if  there  is  no 

/X  <« 

x2(x)  at  all  with  which  x^tx)  may  be  paired  and  which  preserves  X^fx)  as 
a  candidate.  An  even  looser  criterion  is  given  by 

V 

Corollary  la;  X^(x)  will  not  be  a  subsequence  of  the  optimal  estimate  if 

V 

there  exists  some  X^(x)  /  X^Cx),  both  sources  of  x^(x),  where 


p(X1(x)  U1(x) )  s(x2lx1(x),  Y2(x)) 

p(xx(x) | (x ) )  =s(x2lx1(x)  Y2(x)) 


(3-18) 


for  every  x2> 

Proof ;  (3-18)  implies  (3-17).  The  converse  is  not  true  as  the  maximizing 

X^(x)  in  (3-17)  may  vary  with  x2»  Q 


Thus  we  have  established  two  pruning  rules  for  the  x^  trajectories. 
Both  require  functional  dominance  between  two  scaled  versions  of  s(x„|x, ,Y  ) 
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to  hold  for  a  trajectory  X^  to  be  eliminated.  Both  are  weaker  than  the 
optimal  pruning  rules  on  x  X2  implied  by  the  Viterbi  algorithm,  as 
the  latter  are  pointwise  dominance  relations.  Thus  the  strength  of  the 
pruning  technique  has  been  sacrificed;  this  can  only  be  advantageous  if 
either  p(X^|Y^)  or  sU^X^Y.^)  has  a  particularly  convenient  form  compared 
to  s(x1,x2|y  ,Y  ) .  We  will  see  that  this  is  the  case  for  hybrid  state  models. 


IV.  LINEAR  -  GAUSSIAN  CASE 


Before  moving  to  the  hybrid  case,  the  relation  between  linear  filtering 
and  smoothing  algorithms  and  the  quantities  introduced  above  need  to  be 
established.  While  the  linear  case  exhibits  no  special  solution  structures 
as  a  result  of  assumptions  1L  and  2L,  the  development  here  is  necessary 
for  section  V. 


A.  Filtering 

The  solution  to  the  joint  filtering  problem  in  X  is  well  known  for 


the  linear  -Gaussian  case:  the  Kalman  filter  [8].  The  statistics 


x (t)  =  E{x(t) | Y(t) } 


p(t)  =  covlx(t)  Y (t) } 


(4-1) 


may  be  recursively  computed  as 


x ( t)  -  A  x(t-l)  +  K(t) (y (t)  -  C  A  x(t-l)) 


(4-2) 


P(t)  *  (I  -  K(t)  C] (A  P(t-l)  AT  +  £][I  -  K(t)CJT  +  K(t)  R  KT(t) 


K(t)  -  P(t-l)  CT  [C  P(t-l)  CT  +  R] _1 


(4-3) 

(4-4) 
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While  assumptions  1L  and  2L,  imply  a  block  triangular  or  diagonal  form  in 
A,  C,  R,  and  £,  this  is  not  reflected  i.n  the  propagation  of  P(t) ,  and  hence 
in  the  structure  of  the  algorithm.  Thti  reason  for  a  lack  of  separation  is 
(4-4);  the  update  gains  are  not  block  triangular  as  both  y^tt)  and  y2(t) 
convey  information  about  both  components  of  the  state,  just  as  in  section 
XXIA.  Thus  the  linear  -  Gaussian  assumption  does  not  allow  extra  structure 
to  become  apparent. 


Smoothinc 


We  will  consider  only  the  compact  smoothing  problem  here,  as  the 
set  X*  is  an  entire  N^t  dimensional  vector  space  which  cannot  be  profitably 
dealt  with  on  a  pointwise  basis.  Thus  we  will  specialize  Lemma  1  to 
this  case. 


Theorem  2:  Under  assumptions  1L  and  2L,  and  with  A,  C,  £  and  R  the 
matrices  which  can  be  partitioned  to  provide  A  ,  A^,  etc. 
s(x(t) | Y ( t) >  is  of  the  general  form 

A  A 

4(x(t)  -  x(t))T  p_1(t)(x(t)  -  X (t) ) 

s  (t)  e 
° 

with  the  parameters  x  and  P  computable  via  (4-2)  -  (4-4)  and  with 
sQ(t)  given  by; 

SQ(t)  =  (2tt)_N/2  { 2  Tt)  ~M/2  det(2)"1/2  (R)"1/2  SQ(t-l)  (4-5) 

J  (y(t)  -  c  A  x(t-l)T  s_1(t-l) (Y(t)  -  C  A  X(t-l) ) 
e 

S(t)  =  C [A  P(t-l)  AT  +  Q)  CT  +  R  (4-6) 


Proof ;  See  Appendix  A. 
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Four  points  are  important  about  those  equalities.  Foremost  is  the 

/\ 

fact  that  the  survivor  function  is  an  exponential  quadratic,  with  mode  x 
and  quadratic  coefficients  identical  to  those  of  the  conditional  state 
density  computed  by  the  Kalman  filter.  This  gives  a  convenient  double 

/v 

“► 

interpretation  to  x  and  P;  they  are  parameters  of  the  filter  solution,  or 
parameters  of  the  survivor  function.  This  coincidence  is  quite  special  to 
the  linear  case.  Secondly,  the  quadratic  shape  £  is  data  independent; 

A 

x  depends  on  the  observation  trajectory  so  s0  is  also  data  dependent, 
unlike  the  filtering  case.  Since  its  behavior  is  dominated  by  an  exponential 
quadratic  form  of  the  residuals,  sQ  provides  a  quantification  of  the  goodness 
of  fit  of  the  trajectory  to  the  actual  observations  (bigger  is  better) . 

* 

Finally,  this  is  only  half  of  the  smoothing  solution;  reconstruction  of  X 
from  the  mode  of  s(x|y)  can  be  done  in  the  usual  way  [9], 

Thus  the  survivor  function  for  the  linear-Gaussian  smoothing  problem 
can  be  parametrized  by  exactly  the  same  quantities  as  those  computed  by 
the  Kalman  filter,  plus  a  goodness  of  fit  measure  closely  related  to 
p(Y|x  ).  However,  since  the  Kalman  filter  does  not  lead  to  a  separation 
along  the  lines  of  the  hierarchical  structure  of  the  problem,  neither 
does  s(x| Y) . 

V.  HYBRID  CASE 

Now  we  turn  to  the  hybrid  system  case,  given  by  assumptions  1H  and  2H. 
The  set  of  supremal  trajectories  X  is  discrete,  so  they  can  be  viewed 
as  being  arranged  in  a  tree  as  in  section  III.  The  conditional  survivor 
function  S(x2|xi,Y2)  will  be  that  of  a  particular  linear-Gaussian  system 
with  time-varying  dynamics  specified  by  X^,so  the  results  of  theorem  2 
translate  to  it.  Thus  the  smoothing  solution  takes  the  form  of  a  bank  of 
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Kalman  filters,  one  for  each  X^,  with  some  supremal  logic  which  prunes 
elements  of  using  the  tests  of  theorem  1.  While  this  scheme  is  dominated 
by  the  combinatorial  size  of  X^,  we  will  see  that  this  same  structure  dominates 
both  the  filtering  solution  and  the  straightforward  Viterbi  algorithm  for 
hybrid  systems.  Only  the  expanded  smoothing  approach  of  section  IIIC  allows 
any  practical  reduction  in  the  size  of  X^^  on-line. 


A.  Filtering 


The  filtering  problem  for  a  hybrid  system  was  first  addressed  many 
years  ago  [10].  The  exact  solution  is  found  from  a  decomposition  much 
like  (3-3). 

p(x1(t),x2(t)  |Y1(t),Y2(t))  =  p(7~(t)tY  (t^)  l  P<Y1(t)  |xi(t))p(X1<t) ) 

1  2  X.  (t— 1) 


p(*2(t) \x1 (t) >  p(x2 (t) |xx (t) ,y2 tt) ) 


(5-1) 


This  expression  has  two  parts.  The  conditional  distribution 
p(x2(t) | (t) ,  Y2<t))  is  Gaussian,  since  Xx (t)  specifies  completely  the 
linear-Gaussian  dynamics  of  x2,  decoupling  it  from  .  The  remaining 
terms  form  a  set  of  weights,  so  that  the  resulting  conditional  distribution 
is,  for  each  x^(t),  a  weighted  sum  of  Gaussian  distributions  on  x2(t).  In 
general,  there  are  ^  components  in  each  weighted  sum,  each  corresponding 
to  one  element  of  X^(t-l).  The  only  time  this  size  is  reduced  is  if  two 
components  have  exactly  the  same  conditional  mean  and  covariance,  an  event 
that  does  not  happen  at  all  in  general. 

Unlike  the  general  case, and  the  linear  case,  the  structure  of  the 
optimal  state  estimator  forces  one  to  consider  expansions  over  X^ (t) . 
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This  is  because  the  conditional  distribution  p(x2(t)|  X^t),  Y2(t))  is 
conveniently  parameterized  by  its  mean  and  covariance,  but  sums  of  such 
distributions  can  only  be  expressed  in  terms  of  the  parameters  of  the 
components.  However,  the  opportunities  for  reducing  the  complexity  of 
this  expansion  are  almost  nonexistant,  and  this  is  the  point  at  which  engineering 
approximations  for  the  sake  of  implementation  are  usually  made  .  These 
approximations  generally  fall  into  two  categories:  pruning  ,  where  a  term 
in  the  expansion  is  dropped  completely  because  its  weight  is  small  relative 
to  others,  and  merging ,  where  two  or  more  terms  in  the  expansion  are 
replaced  by  a  single  "equivalent"  term,  where  "equivalent"  is  often  taken 
to  mean  "of  equal  conditional  mean  and  variance".  Criteria  for  determining 
candidates  for  pruning  or  merging  are  legion.  However,  all  have  some 
detrimental  impact  on  estimation  performance. 

B.  Smoothing 

The  smoothing  problem  has  a  structure  wherein  pruning  is  a  natural 
operation.  While  the  ideal  smoother  requires  a  survivor  function  which  has 
many  components  to  it,  each  being  a  weighted  Gaussian  shape,  the  combination 
of  components  is.  by  a  max  operator,  rather  than  a  sum.  Thus  some  compo¬ 
nents  may  in  fact  be  completely  dominated  by  others,  and  dropped  without 
affecting  the  selection  of  the  trajectory  estimate.  This  is  the  idea  behind 
optimal  pruning  of  trajectories. 

Consider  (3-3)  and  (3-12) : 


max 

X1'X2 


p(x1,x2|y1,y2 


P(Y 


1'V 


max 

X, 


x1)p(x1) 


max  (s(x2|xi,T2) }} 
X2 


(5-2) 
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From  Lemma  2  and  theorem  2,  s (x2  (t)  j (t)  ,H ^  (t) )  will  have  a  weighted 
Gaussian  shape;  hence  the  outer  maximization  is  over  a  set  of  Gaussian 
shapes  weighted  by  both  supremal  and  infernal  components.  It  is  conceivable 
that  some  terms  in  this  set  may  be  eliminated  by  the  criterion  stated 
in  Theorem  1. 

First  establish: 


Leamma  4:  In  the  hybrid  case,  sfx^X^Y^)  may  be  computed  by 

a)  predict: 

A  A 

x(t+l|t)  =A(x1(t))  x (t j t)  (5-3) 

P(t+l|t)  =  A(x1(t))  P(t|t)  AT(Xl(t))  +£(Xl(t))  (5-4) 

"V2  1  /? 

SQ(t+l|t)  =  (2tt)  det  (£(x  (t)))  '  so(t|t)  (5-5) 

b)  update 

/N  A  /V 

x(t+l|t+l)  =  x(t+l|t)  +  K(t+1) [y(t+l)  -  C(x  (t+l))x(t+l|t)] 

(5-6) 

P(t+l|t+l)  =  [I  -  K(t+l)C(x1(t+l))]P(t+l|t)  [I  -  Kft+DCU^t+l))] 
+  K(t+1)  RU^t+l))  (t+l)KT(t+l)  (5-7) 

'M2/2  1/2 

SQ(t+l|t+l)  =  (2tt)  det  R(x^  (t+1)  )~'l//  SQ  (t+1 1 1) 

A 

-  -|[y(t+l)  -  C(x1(t+1)  )x(t+l  |t)]T  s;”1  (t+1) 
e 

[y (t+1)  -  C(x  (t+l))x(t+l ) t) 1  (5-8) 

S(t+1)  =C(x1(t+l))  P(t+l|t)  CT(x1(t+l))  +R(x1(t+1)) 


(5-9) 
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where 

A  A 

x(t|x)  =x(t|xi(T),  y2<t))  (5-10) 

etc. 

Proof :  Apply  Theorem  2  to  the  recursion  of  Lemma  2,  conditioning  on  X^. 

This  structure  of  s(x2lY1»Y^  indicates  that  a  strict  Viterbi  algorithm 

on  X  necessarily  involves  a  parametrization  which  is  based  on  trajectories 
X^  .  Thus  the  compact  smoothing  algorithm  of  sec  .ion  IIIB  is  no  simpler 
them  the  expanded  structure  of  IIIC  in  this  hybrid  state  case. 

Definition:  A  quality  function  q (xj  X^ ^ ,Y„)  is  given  by 

q(x2,X1| Y1,Y2)  =P(Y1|X1)  p(X1)  s(x2|X1>Y2)  (5-11) 

O 

In  the  hybrid  case,  q(x2x1| Y^ ,Y2)  is  a  scaled  Gaussian  with  mode  and 
quadratic  weights  given  by  Lemma  4,  and  with  scale  factor 

qo(xilvV  =  so(W  p(Yilxi)  p(xi}  (5-12) 

where  the  latter  two  terms  may  be  recursively  computed  via  Lemma  2. 

The  crux  of  the  expanded  smoothing  algorithm  in  the  hybrid  case  is: 

Theorem  3:  A  supremal  trajectory  X^(i)  will  never  be  a  subsequence  of 

*  * 

an  optimal  trajectory  estimate  (X^t),  X2(t)),  t  >  t,  if  there 
exists  another  X^T)  ?  X^T)  which  is  a  source  of  x^t)  and  for 
which 

q(x2(T),X1(T)|Y1(T),Y2(T))  <  q  (x2  (x  )  ,X±  (x  )  |  (t  )  ,Y2  (T  )  )  (5-13) 

for  all  values  of  x2(t).  This  inequality  holds  iff 


k  *k 


Sr-.v 

£*."• 
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p"1(t|t)  -  p_1(t|t)  >o  (5-14) 

i(x2(t|T)  -  X2(t|t))T  [P  (  T  j  T>  -  P  ( T  |  T)  ]  1  (x2(t|t)  -  X2(t|t))  ^ 

Sin  qo(X1(T)|  Y1(t)#Y2(t)/-  Jin  q^Xj^  (x)  |  *1  (t)  ,Y2  (T) ) 

(5-15) 


Proof :  (5-13)  is  a  restatement  of  corollary  la.  The  equivalence  of 

(5-13)  to  (5-14)  -  (5-15)  is  shown  in  Appendix  B. 

O 

The  interpretation  of  these  conditions  is  interesting.  Figure  la 

illustrates  a  case  where  the  q  associated  with  X^  allows  it  to  be  eliminated 

in  favor  of  X^.  (5-14)  requires  that  the  conditional  Fisher  information 

matrix  of  a  pruned  trajectory  be  greater  than  that  of  the  one  that 

dominates  it;  Figure  lb  shows  that  violation  >f  this  inequality  will  lead 

to  q  dominating  q  on  the  tails  of  the  distrib  it ions.  Thus  trajectories 

with  good  conditional  information  may  be  eliminated  in  favor  of  those  with 

poorer  information,  but  not  vice-versa;  this  imparts  a  natural  conservatism 

to  the  pruning.  For  cases  which  satisfy  (5-14),  and  for  a  given  x2» 

%  ^ 

(5-15)d«termines  an  ellipsoidal  region  wherein  x2  may  lead  to  pruning  X2« 
Note  that  (5-14)  ensures  that  the  left  hand  side  of  (5-15)  will  always 
be  nonpositive,  hence  if 


9o«llVV 

'  1 


(5-16) 


then  this  ellipsoid  will  be  empty.  (Figure  lc)  .  (Note  that  (5-16)  can 
be  interpreted  as  a  likelihood  ratio  test  on  the  hypotheses  that  X^  or  X^ 


is  the  true  trajectory) .  Even  if  (5-16)  is  satisfied,  if  the  offset 


(a)  X,  pruned 


)  violated 
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bet  ween  the  conditional  means  is  too  large,  no  elimination  can  take  place 
(Figure  Id) . 

Since  theorem  3  is  based  on  corollary  la,  it  is  not  as  complete  as 
possible.  There  may  be  cases  where  X^  is  dominated  by  neither  X^  nor 

II 

X^  alone,  but  is  dominated  by  the  max  of  their  respective  q  functions 

I  *  1 

(Figure  2)  (provided  x^(x)  =  x^(x)  =  x  (x)).  \Jhile  the  general  inequality 
of  theorem  1  may  be  applied:  prune  X^  if 

Vx2  q(x2|  X1#Y1,Y2)  <  max  qtxJX^Y^Y)  (5-17) 

X^  6  sources (x^) 

the  reduction  of  this  test  to  simple  algebraic  tests  such  as  (5-14)  - 
(5-15)  is  rather  cumbersome. 

Thus  the  hierarchical  structure  of  the  hybrid  state  dynamics,  coupled 
with  the  simple  parameterization  of  the  conditional  survivor  function, 
leads  to  a  hierarchically  structured  algorithm  for  £he  smoothing  problem. 
The  infernal  level  consists  of  a  Kalman  filter  computing  the  mode  and 
quadratic  spread  of  the  survivor  function,  and  a  scale  factor  calculation 
based  on  the  Kalman  filter  residuals  and  applicable  noise  covariances. 

The  supremal  logic  computes  conditional  probabilities  on  X^  based  on  Y^, 
and  then  prunes  away  some  possibilities  based  on  a  Viterbi-like  criterion 
posed  in  terms  of  functional,  rather  than  pointwise,  dominance. 

The  smoothing  problem  is  more  tractable  than  the  filtering  problem 
because  terms  in  a  max-of-weighted-  Gaussian  functions  may  be  dropped 
completely,  whereas  all  terms  in  a  sum-of-weighted-Gaussian  must  be 
obtained.  Thus  the  smoothing  problem  admits  a  simplification  in  the 
combinatorial  aspects  of  the  hybrid  problem  which  js  unavailable  in  a 


-25- 


filtering  approach.  However,  due  to  the  coincidence  of  the  shape  of  the 
conditional  survivor  function  s(x2|xi#Y2)  and  the  conditional  distribution 
p(x2|x^,Y2),  the  Kalman  filter  statistics  computed  by  the  infernal  algorithm 
can  also  be  interpreted  as  the  conditional  mean  and  covariance  of  x2  for 

the  trajectory  X^.  With  this  view,  the  output  of  the  algorithm  would  be 

*  * 

X^ ,  the  MAP  discrete  trajectory,  and  p(x2|x^,Y2>,  the  corresponding 

continuous  state  distribution.  This  type  of  output  may  be  quite  suitable 
for  maneuvering  target  tracking  and  communications  problems. 

Finally,  it  is  important  to  note  that  since  (5-15)  involves  means 
and  scale  factors,  which  are  data  dependent  quantities,  there  is  no  pre¬ 
determined  order  in  which  the  X^  trajectories  are  eliminated.  Algorithms 
for  which  the  pruning  logic  is  da|a  dependent  are  typically  quite  difficult 
to  analyze;  the  most  important  contribution  of  theorem  3  is  the  guarantee 

that  the  pruning  logic  stemming  from  it  will  never  increase  the  probability 

*  * 

of  error  in  the  determination  of  (X^ ,  X2 ) ;  it  is  optimal.  However,  that 

pruning  logic  is  generally  insufficiently  powerful  to  reduce  the  search 
* 

for  X^  to  manageable  sizes;  other  techniques  are  required  for  an  actual 
implementation . 


VI.  EXAMPLE 

A.  Problem 

Consider  a  simple  scalar  hybrid  system,  where  the  plant  dynamics 
are  fixed 

x2(t+l)  =  .99  x2(t)  +  w(t)  (6-1) 

with  Q  =  .035  so  that  x  is  normally  distributed  around  zero,  with 
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*1 

*2 

P 

q 

a 

0 

0.0 

1.1 

-17.80 

b 

1 

-.428 

.204 

-16.60 

c 

0 

-5.961 

.264 

-38.16 

d 

1 

-3.169 

.128 

-65.59 

e 

0 

-.308 

.378 

-15.48 

f 

1 

-.439 

.15 

-14.22 

g 

0 

-.267 

.277 

-13.09 

h 

1 

-.403 

.131 

-11.86 

i 

1 

-.343 

.096 

-9.52 

Table  Is  Description  of  Survivors 


variance  1.75  in  the  steady  state.  The  discrete  state  models  a  change  in 
the  sensor  structure;  if  =  1  (normal) 

y2(t)  =  x2(t>  +  v(t)  R(1)  =  ,25  (6-2) 

and  if  x^  =  0  (abnormal) : 

y2(t)  -  v (t)  R (0)  =  18  (6-3) 

Note  that  the  standard  deviation  of  the  prior  distribution  on  y2  in  state  0, 
3/2  ,  is  thrice  that  of  the  prior  on  y2  when  =  1  (in  the  steady  state )  • 

This  model  may  apply  in  cases  where  x2  is  a  plant  and  x^  models  a 
sensor  failure;  where  x2  is  an  object  and  x^  a  random  detection  process? 
or  where  x2  is  a  signal  and  x^  the  presence  of  interference.  In  all  cases, 
if  the  dynamics  of  x^  are  as  shown  in  Figure  3,  and  since  p  .1,  the 
failure  /  inference  process  is  "bursty":  it  has  memory  (with  an  expected 
holding  time  of  10/3  time  steps  in  state  0)  .  This  requires  an  algorithm 
for  smoothing  or  estimation  which  exploits  the  dynamics  of  x^  in  order 
to  perform  well. 

Figure  4  shows  a  typical  sample  path  of  the  hybrid  process  described 
above.  Figure  5  shows  the  corresponding  results  of  using  Theorem  3  to 
prune  the  tree  of  possible  X^.  Each  "x"  indicates  the  time  seep 
at  which  its  corresponding  trajectory  was  eliminated.  Note  that  when  all 
descendents  of  a  node  are  pruned,  that  node  itself  is  eliminated;  the  heavy 
lines  indicate  the  trajectories  which  are  still  candidates  at  time  7. 

While  there  are  trajectories  passing  through  both  x.^  =  0  and  x^  *  1  at 
times  1  and  2,  all  trajectories  pass  through  x^  =  0  at  times  3,4,  and  5. 
Refering  to  figure  4,  this  indicates  that  the  obvious  outliers  at  t=3 
and  t«5  have  been  confirmed  as  arising  from  state  0.  Note  that  y2(4) 
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has  been  (correctly)  determined  to  have  arisen  from  ■  0  through  the  menory 
of  the  supremal  process. 

There  is  a  reason  for  the  apparent  dominance  of  XjS  terminating  in  a 

run  of  x^(t)  *  0.  Consider  the  model.  If  x(t)  =0  then  the  Kalman  filter 

estimate  of  x is  not  updated,  and  the  associated  conditional  covariance  P 

is  larger  than  it  would  be  on  an  identical  path  except  with  x^(t)  *  1, 

-+ 

where  x^  would  be  updated.  Thus  the  covariances  along  trajectories  with 

many  x^(t)  =  0  will  be  larger  than  those  with  several  x^(t)  =  1;  the 

condition  (5-14)  gives  preference  to  the  former.  In  fact,  the  trajectory 

x^Ct)  =  0  will  never  be  eliminated,  for  this  reason;  thus  events  x^(t)  =  1 

will  never  be  confirmed.  Thus  the  optimal  algorithm  can  only  confirm  events 

where  x^(t)  =  0.  (Intuitively,  this  is  in  anticipation  of  the  possible, 

albeit  unlikely,  event  that  a  future  sequence  of  observations  will  fit 
the  dynamics  of  x2  perfectly,  but  for  an  initial  state  far  from  zero  . 

If  these  were  observed,  the  data  thus  far  would  be  confirmed  as  all  having 

come  from  the  interference.) 

Table  1  shows  the  parameters  describing  each  surviving  trajectory 

-►  I 

in  figure  5.  Most  represent  components  of  s(xj_»x  |Y^,Y2)  which  are 
~¥ 

clustered  near  x2  *  .35,  and  it  is  possible  that  the  general  mechanism 
of  theorem  1  might  eliminate  one  or  more  of  these  which  were  missed  by 
that  of  theorem  3.  Coincidentally,  the  survivor  corresponding  to  the 
true  trajectory,  i,  has  the  highest  quality  factor  q. 

Finally,  figures  6  and  7  show  the  effectiveness  of  the  optimal 
pruning  mechanism  over  a  longer  period  of  time.  Figure  6  compares, 
on  a  log  scale,  the  actual  number  of  survivors  against  the  total  size 
of  x*.  it  is  clear  that,  for  this  example  at  least,  the  exponential 


FIGURE  6:  Logarithm  of 


growth  of  candidate  X^s  has  been  averted.  Figure  7  shows  the  same 
count  on  a  linear  scale;  the  number  of  survivors  seems  to  stay  roughly 
constant  when  =  0  (as  many  trajectories  with  =  1  can  be  primed) 
and  to  grow  roughly  linearly  when  x^^  =  1.  In  particular,  the  jump  in  y^ 
at  t  *  19  causes  a  net  reduction  in  the  surviving  X^  at  t  =  20. 

Thus  the  optimal  pruning  mechanism,  while  not  complete,  is  still 
capable  of  significantly  reducing  the  combinatorial  aspect  of  the  hybrid 
smoothing  problem,  at  least  for  this  example.  A  general  categorization  of 
its  effectiveness  is  yet  to  be  determined. 

VII.  CONCLUSIONS 

In  conclusion,  this  work  has  presented  a  new  perspective  on  filtering 
and  smoothing  for  hierarchical  Markov  processes,  particularly  hybrid  state 
systems.  The  results  fall  into  two  categories.  The  negative  results  are 
that  the  hierarchical  structure  does  not  contribute  to  simplification  of 
the  solution  to  the  state  estimation  problem,  nor  to  the  trajectory 
estimation  problem  for  discrete  state,  or  linear  -  Gaussian,  problems. 

The  positive  results  are  related  to  the  hybrid  case,  where  both  state  and 
trajectory  estimation  are  dominated  by  a  structure  involving  combinations 
of  weighted  Gaussian  terms.  While  both  can  then  be  realized  by  separate 
confutations  of  the  weights  and  parameters  of  the  Gaussian  shapes,  only 
the  smoothing  problem  affords  us  the  opportunity  to  eliminate  some  of 
the  components  entirely.  This  simplification  of  the  combinatorial 


aspect  of  the  problem  suggests  adoption  of  the  trajectory  estimation 
approach  to  hybrid  systems,  particularly  in  light  of  the  relationship 
between  the  parameters  of  the  Gaussian  components  in  the  two  cases; 
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they  are  confuted  by  the  same  Kalman  filters. 

The  results  of  the  adoption  of  the  trajectory  estimation  viewpoint  is 
a  pruning  rule  which  is  optimal  in  a  well  defined  sense:  the  elimination 
of  a  trajectory  is  guaranteed  to  never  increase  the  probability  of  error  in 
estimating  the  discrete  state  trajectory.  An  example  showed  that  this  rule 
alone  can  be  effective,  but  that  some  other  selection  mechanism  is  required 
in  order  to  bound  the  number  of  survivors  at  a  finite  level. 

Computationally ,  the  structure  of  the  algorithm  described  in  Section  V 
is  ideal  for  VLSI  implementation.  The  infernal  calculations,  involving  Kalman 
filters  and  residual  computations,  are  completely  separate  from  one  another 
and  would  benefit  from  parallel  execution.  The  interconnection  between 
them  is  provided  by  the  (simple)  supremal  computation  involving  the  discrete 
observation,  and  the  pruning  mechanism.  The  latter  involves  iimple  exchange 
and  tests  of  the  results  of  the  separate  infernal  calculations,  and  thus  is 
a  relatively  loosely  coupled  mechanism. 

Thus  this  work  presents  a  new  approach  to  hybrid  state  tracking  problems. 
While  it  does  not  completely  specify  an  implementable  algorithm,  an 
approach  which  can  reduce  a  set  of  1,048,576  candidates  to  only  66  without 
increasing  the  probability  of  error  is  a  useful  first  step. 


This  gives  the  prediction  equations: 


x(t+l  jY(t))  =Ax(t|Y(t>) 


(A-7) 


P(t+1  | Y^t) )  =  A  P(t)|  Y(t))  AT  +  £ 


(A-8) 


sQ(t+i|y(t)) 


(2rr)"N/2  (det  Q)~  2  sQ(t|y(t)) 


(A-9) 


Now  (3-8)  updates  these  with  y(t+l): 


x+|Y(t+D)  =  (2*)“M/2  det(R) 


-  "  (y+  -  C  x+)T  R-1  (y+  -  C  x+) 


1  -N-  +  T  -I  + 

-■f(*  -  x)1  P  A  (X  -  X) 

so(t+l|y(t))  e 


(A-10) 


where  now 


X  =  X ( t+1 ) 


y+  *  y (t+1) 


(A-ll) 


x  *  X (t+1) I Y(t) ) 


Combining  and  completing  the  squares  in  (A-10)  gives 


s(x+|  Y(t+1))  =  s  (t+1)  |  Y (t) )  (2it)"M/2  det  (R) 


1++  ,  T  _T._%-1/++  . 

-1/2  "2 (y  “C**  (C  P  C  +R)  (y  -Cx) 


■■|-(x+  -  x+)  (CPC ^  +  R)  (x+  -  x+) 


(A-12) 


where 


-*•+  -*■  -*■  +  ->■ 
x  -  x  +  K (t+1)  (y  -  C  x) 


(A-13) 


This  gives  the  update  expressions: 


x(tn|y(t+l)  =  x*(t+l)|  Y  (t) )  +  K(t+1)  (y(t+l)  -  C  x(t-l|y(t))) 


(A-14) 


K (t+1)  =  P(t+l|Y(t))  CT[C  P(t+l|  Y(t))  CT+  Rl'1 


(A-15) 
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P(t+1) |Y(t+l)  =  [I  -  K(t+1) Cl  P(t+l|Y(t+l)) [I  -  K(t+1)C]T 
+  K.(t+1)  R  JC^t+l) 


s  (t+1)  |y(t+l))  =  s  (t+1)  |  V (t) )  (2tt)"M/2  det  (R)"1/2 


- ~(y (t+i) -c  x (t+i) | y (t+i) )T  s'1 (y (t+i)  -  c  X 

e 

s  -  c  p(t+i|y(t))  cT  +  r 

Note  that  (A-14)  -  (A- 16)  are  the  usual  Kalman  filter  equations 
(A-17)  -  (A- 18)  accumulate  the  effect  of  the  residuals 

A 

y (t+i)  -  c  x(t+l|y(t)) 

weighted  by  their  inverse  covariance  £. 

Combining  (A-17)  -  (A-9)  with  (A-14)  -  (A-18)  yields  (4-2)  -  ( 


□ 


(A-16) 

(A-17) 

(t+1) l Y(t) )  ) 
(A-18) 


(A-19) 

-6). 


APPEND IX  B 


Proof  of  Theorem  3 


We  seek  conditions  equivalent  to  the  statement 


q  e 


-f(x 


-*  »T  7i~  1  i*  "*■  »  1  -►  .  -1  /*•  +  , 

x2>  P  (X  -  x  )  -j(x  -  x2)  P  {x  -  x2) 

^  qQ  e 


(B-l) 


for  all  x. 

Taking  logarithms  and  rearranging  terms,  this  is  equivalent  to 


1 

2 


-*-T 

x 


A*_l 


1  X 


1  -*-T  ~-l 

-  2  x  fZ 


T  -*■ 


x 


(B-2) 


+ 


1  ->T 

2  X 


1  -►T  *1  ■+  ■  .  “  . 

2  X2  P  x2iln  %  '  ln  % 


On  the  left  is  a  quadratic  function  of  x.  It  will  be  bounded  below  by 
a  finite  constant  only  if 


P_1  -  P*1  >.0  (B-3) 

This  gives  (5-14) .  The  inequality  will  hold  iff  the  minimum  of  the 
quadratic  function  is  greater  than  the  right  hand  side.  That  minimum 
is  achieved^-  at 

x  a  [P  A  -  P  ]  [P  1  x^  -  P  x2l  (B-4) 

Using  the  fact 

_1  _  1  V  V  —  1 

[P-P]  =  P  -  P[P  -  Pi  P  (B-5) 

and  substituting  into  (B-2),  the  inequality  holds  iff 

j(x2  -  x2)T[£  -  P]"1  (x2  -  x2)  ■>  «,n  qQ  -  in  qQ  (B-6) 

This  is  (5-15) . 

i  This  assumes  (B-3)  is  strict.  If  not,  reformulate  this  entire  development 
in  the  largest  subspace  of  X2  on  which  (B-3)  holds  strictly. 
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